Graphics

May 7, 2019

Better Graphics with ggplot

Grammar of Graphics

  • So far we have been using the “base” graphics of R.
  • Base graphics can look good, but they take some work.
  • In this lesson, we’ll explore the ggplot2 package, which is part of the tidyverse.
  • ggplot2 graphics look better “out of the box”, and the syntax follows the tidyverse philosophy.

\(\Rightarrow\) Make a new RMarkdown document

  1. Make sure you are in your DataWorkshop project. If not, switch to it.
  2. In the File menu, choose New File > RMarkdown… and title a new document.
  3. Clicking OK will open a new RMarkdown document with some boilerplate code. Delete everything below line 7.
  4. Save the document as tidyplots.Rmd in the project directory DataWorkshop. After you save the file, it should appear in the Files tab, along with the file DataWorkshop.Rproj.

\(\Rightarrow\) Load the tidyverse

library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0       ✔ purrr   0.2.5  
## ✔ tibble  2.0.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.2       ✔ stringr 1.3.1  
## ✔ readr   1.3.1       ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

\(\Rightarrow\) Look at the mpg tibble

The tidyverse comes with a built-in tibble called mpg (it ships with ggplot2). To open it in the RStudio data viewer, type View(mpg) in the console. You can also just echo the name:

mpg
## # A tibble: 234 x 11
##    manufacturer model displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4      1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4      1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4      2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4      2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4      2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4      2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4      3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 q…   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 q…   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 q…   2    2008     4 manu… 4        20    28 p     comp…
## # … with 224 more rows

\(\Rightarrow\) Review Exercise

Use the tidyverse data transformations to create a list of all the auto manufacturers, along with the average city mpg of their vehicles, sorted from most fuel-efficient to least.

Using ggplot

Data Visualization

Most of this lesson is adapted from Chapter 3 of R for Data Science.

ggplot grammar

  • ggplot() creates a blank graph.
  • ggplot(mpg) associates the mpg data.
  • ggplot(mpg) + geom_point() adds a scatterplot to the graph, but we need to specify what variables to use.
  • aes() specifies which variables are represented by different properties of the graph (aesthetics).
  • There are lots of different geoms. You can add layers to the plot with +.
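
The bullets above can be read as a step-by-step build. The following is only a sketch using the built-in mpg data; the object names p_blank, p_data, p_points, and p_layers are made up for illustration, and geom_smooth() is borrowed from later in this lesson.

```r
library(ggplot2)

# Step 1: a blank graph with no data attached.
p_blank <- ggplot()

# Step 2: associate the mpg data; nothing is drawn yet.
p_data <- ggplot(mpg)

# Step 3: add a point layer, using aes() to say which variables
# go on the x and y axes.
p_points <- p_data + geom_point(aes(x = displ, y = hwy))

# Step 4: each further + adds another layer on top.
p_layers <- p_points + geom_smooth(aes(x = displ, y = hwy))

# Printing a plot object draws it.
p_points
```

Storing each step in a variable is optional; it is done here only so that the intermediate objects can be inspected.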

Scatterplots

ggplot scatterplot

ggplot(mpg) + geom_point(aes(x=displ, y=hwy))

Coloring factors

ggplot(mpg) + geom_point(aes(x=displ, y=hwy, color = class))

Sizing dots

ggplot(mpg) + geom_point(aes(x=displ, y=hwy, color = class, size=cyl))

Manual settings

ggplot(mpg) + geom_point(aes(x=displ, y=hwy), color = "blue")

Manual settings (wrong: why?)

ggplot(mpg) + geom_point(aes(x=displ, y=hwy, color = "blue"))
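
One way to investigate the difference between the two "manual settings" plots is to look at the colour ggplot2 actually computes for each point. The sketch below assumes the ggplot2 helper layer_data(), which returns the data frame used to draw a layer; the names p_set and p_mapped are hypothetical.

```r
library(ggplot2)

# Colour given outside aes(): a fixed property of the whole layer.
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# Colour given inside aes(): the string "blue" is treated as data,
# i.e. a one-level variable that is then mapped through a colour scale.
p_mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))

# Compare the colours that end up on the points.
unique(layer_data(p_set)$colour)     # the literal colour that was set
unique(layer_data(p_mapped)$colour)  # a colour chosen by the default scale
```

The second plot also gains a legend with a single entry labelled "blue", which is a hint that the value was handled as a variable rather than as a colour.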

\(\Rightarrow\) Scatterplot exercises

  1. Map a continuous variable to color, size, and shape.

  2. Map the same variable to multiple aesthetics.

  3. What does the stroke aesthetic do? What shapes does it work with? (Hint: use vignette("ggplot2-specs").)

  4. What happens if you map an aesthetic to something other than a variable name, like aes(color = displ < 5)? Note, you’ll also need to specify x and y.

Adding geometries

Facet Grid

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

Facet Grid (collapsed)

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_grid(. ~ cyl)

Facet Wrap

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

ggplot2 geometries

\(\Rightarrow\) In the lower-right pane, click on the Packages tab, then find the link for ggplot2. Clicking this link should bring up the package help pages; scroll down to the “G” section and observe all of the available geometries.

Recall: geom_point

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy))

Replace geom_point with geom_smooth

ggplot(mpg) + 
  geom_smooth(aes(x = displ, y = hwy))

Combine geometries with +

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy))

Change the method

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy), method = "lm")
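
The combined plots above repeat the same aes() call in every layer. A possible shortening is to pass the mapping to ggplot() itself, in which case each layer inherits it. This is a sketch rather than the form used in these slides; p_local and p_global are hypothetical names.

```r
library(ggplot2)

# Mapping repeated in each layer, as in the examples above.
p_local <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy)) +
  geom_smooth(aes(x = displ, y = hwy), method = "lm")

# Mapping given once in ggplot(); both layers inherit it.
p_global <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

p_global
```

A layer can still add or override aesthetics with its own aes(), so a global mapping does not prevent, say, coloring only the points.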

linetype aesthetic

ggplot(mpg) + 
  geom_smooth(aes(x = displ, y = hwy, linetype = drv))

\(\Rightarrow\) Geometry exercises

  1. Try to produce the following plot:

\(\Rightarrow\) Geometry exercises

  1. Read ?facet_wrap. What does nrow do? What does ncol do? Why doesn’t facet_grid() have these arguments?

  2. Investigate the ggplot2 geometry functions (click on the ggplot2 link in the Packages tab). What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?

Bar graphs and more

\(\Rightarrow\) Explore the diamonds tibble

Try View(diamonds) in the console to see the contents of this built-in tidyverse data set.

diamonds
## # A tibble: 53,940 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # … with 53,930 more rows

Bar charts

ggplot(diamonds) + 
  geom_bar(aes(x = cut))
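
geom_bar() needs only an x variable because it counts the rows in each category itself. As a rough check, the same chart can be built from an explicit table of counts. The sketch assumes dplyr's count() and ggplot2's geom_col(), neither of which appears in the slides above; cut_counts is a hypothetical name.

```r
library(ggplot2)
library(dplyr)

# Count the diamonds in each cut category explicitly.
cut_counts <- diamonds %>% count(cut)

# geom_col() takes bar heights directly from the y variable, so this
# should draw the same bars as ggplot(diamonds) + geom_bar(aes(x = cut)).
ggplot(cut_counts) +
  geom_col(aes(x = cut, y = n))
```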

Bar charts (segmented)

ggplot(diamonds) + 
  geom_bar(aes(x = cut, fill=clarity))

ggplot(diamonds) + 
  geom_bar(aes(x = cut, fill=clarity), position="fill")

Bar charts (grouped)

ggplot(diamonds) + 
  geom_bar(aes(x = cut, fill=clarity), position="dodge")

Boxplots

ggplot(diamonds) + 
  geom_boxplot(aes(x = cut, y = price))

Using pipes

ggplot(diamonds) + 
  geom_boxplot(aes(x = cut, y = price))

does the same thing as:

diamonds %>% ggplot() + 
  geom_boxplot(aes(x = cut, y = price))

Filter, then pipe to ggplot

diamonds %>%
  filter(price < 7500) %>%
  ggplot() + 
  geom_boxplot(aes(x = cut, y = price))
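
The pipe examples work because %>% binds more tightly than +, so the data frame is piped into ggplot() first and the layers are then added to the result. A small sketch to check this, with hypothetical object names:

```r
library(ggplot2)
library(dplyr)

# Data passed directly to ggplot().
p_direct <- ggplot(diamonds) +
  geom_boxplot(aes(x = cut, y = price))

# Data piped in; R parses this as (diamonds %>% ggplot()) + geom_boxplot(...).
p_piped <- diamonds %>%
  ggplot() +
  geom_boxplot(aes(x = cut, y = price))

# Both plot objects should carry the same data.
identical(p_direct$data, p_piped$data)

# With a filter step, only the filtered rows reach the plot.
p_filtered <- diamonds %>%
  filter(price < 7500) %>%
  ggplot() +
  geom_boxplot(aes(x = cut, y = price))
nrow(p_filtered$data)
```

Inside the plot itself the layers are still joined with +; the pipe is only used to get the data into ggplot().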

Density plots

ggplot(diamonds) + 
  geom_density(aes(x = price, fill=cut))

Density plots (with transparency)

ggplot(diamonds) + 
  geom_density(aes(x = price, fill=cut), alpha=0.3)

\(\Rightarrow\) More exercises

  1. Create a density plot for diamonds priced between 2500 and 7500, grouped by cut. Then make a histogram of the same thing.

  2. Try appending + coord_flip() to the end of a ggplot expression.

  3. Try appending + coord_polar() to the end of a ggplot expression for some of the above boxplots. Can you figure out how to make a traditional pie chart? (Not that you ever should.)

  4. Try appending + theme_classic(). Investigate other ggplot2 themes.

  5. Using the mpg data, make an appropriate chart showing the average city mpg of each manufacturer, in order.